12,732 research outputs found
Recommended from our members
Reliability Assessment of Legacy Safety-Critical Systems Upgraded with Fault-Tolerant Off-the-Shelf Software
This paper presents a new way of applying Bayesian assessment to systems that consist of many components. Full Bayesian inference with such systems is problematic, because it is computationally hard and, far more seriously, one needs to specify a multivariate prior distribution with many counterintuitive dependencies between the probabilities of component failures. The approach taken here is one of decomposition. The system is decomposed into partial views of the system, or parts thereof, with different degrees of detail, and then a mechanism for propagating the knowledge obtained with the more refined views back to the coarser views is applied (recalibration of coarse models). The paper describes the recalibration technique and then evaluates the accuracy of recalibrated models numerically on contrived examples using two techniques: u-plot and prequential likelihood, developed by others for software reliability growth models. The results indicate that the recalibrated predictions are often more accurate than the predictions obtained with the less detailed models, although this is not guaranteed. The techniques used to assess the accuracy of the predictions are accurate enough for one to be able to choose the model giving the most accurate prediction
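As a rough illustration of the prequential likelihood comparison mentioned in this abstract, the sketch below scores two one-step-ahead predictors on the same sequence of inter-failure times. The exponential predictive densities, the two predictors and the data are all assumptions made up for the example; they are not the models or data used in the paper.

```python
import math

def log_prequential_likelihood(times, predict_rate):
    """Sum the log predictive densities of each observation.

    Each prediction uses only the observations that came before it,
    and is assumed (for illustration) to be an exponential density.
    """
    total = 0.0
    for i in range(1, len(times)):
        rate = predict_rate(times[:i])            # prediction made before seeing times[i]
        total += math.log(rate) - rate * times[i]  # log of rate * exp(-rate * t)
    return total

def running_mean_rate(history):
    """Hypothetical predictor A: rate estimated from all earlier observations."""
    return len(history) / sum(history)

def fixed_rate(history):
    """Hypothetical predictor B: ignores the data and always predicts rate 1."""
    return 1.0

# Invented inter-failure times, for illustration only.
times = [2.0, 4.0, 3.0, 5.0, 6.0]

log_pl_a = log_prequential_likelihood(times, running_mean_rate)
log_pl_b = log_prequential_likelihood(times, fixed_rate)

# A positive log ratio favours predictor A on this data.
log_ratio = log_pl_a - log_pl_b
```

The predictor with the larger prequential likelihood gave, on this sequence, the more accurate sequence of predictions; the comparison says nothing about accuracy on other data.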
Assessing Asymmetric Fault-Tolerant Software
The most popular forms of fault tolerance against design faults use "asymmetric" architectures in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial, the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes. Indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of coverage of fault tolerance mechanisms can be trusted
Improving DBMS performance through diverse redundancy
Database replication is widely used to improve both fault tolerance and DBMS performance. Non-diverse database replication has a significant limitation - it is effective against crash failures only. Diverse redundancy is an effective mechanism for tolerating a wider range of failures, including many non-crash failures. However, it has not been adopted in practice because many see DBMS performance as the main concern. In this paper we show experimental evidence that diverse redundancy (diverse replication) can bring benefits in terms of DBMS performance, too. We report on experimental results with an optimistic architecture built with two diverse DBMSs under a load derived from the TPC-C benchmark, which show that a diverse pair performs faster not only than non-diverse pairs but also than the individual copies of the DBMSs used. This result is important because it shows potential for DBMS performance better than anything achievable with the available off-the-shelf servers
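One way a diverse pair could outperform both of its members is sketched below. It assumes that an "optimistic" middleware forwards each statement to both servers and returns whichever response arrives first, and it ignores middleware overhead and the cost of keeping replicas consistent. The per-statement latencies are invented; they are not measurements from the paper.

```python
def total_time(latencies):
    """Time for one server to process the whole workload on its own."""
    return sum(latencies)

def optimistic_pair_time(latencies_a, latencies_b):
    """Time for the pair when the faster response to each statement is used."""
    return sum(min(a, b) for a, b in zip(latencies_a, latencies_b))

# Hypothetical per-statement latencies (ms) for two diverse servers.
# Each server is faster on some statements and slower on others.
server_a = [4, 9, 3, 8]
server_b = [7, 2, 6, 5]

time_a = total_time(server_a)
time_b = total_time(server_b)
time_pair = optimistic_pair_time(server_a, server_b)
```

Under these assumptions the pair can never be slower than its faster member, and it is strictly faster whenever neither server is quickest on every statement. A non-diverse pair, whose two copies have similar latencies on every statement, would gain little from the same scheme.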
Rephrasing rules for off-the-shelf SQL database servers
We have reported previously (Gashi et al., 2004) results of a study with a sample of bug reports from four off-the-shelf SQL servers. We checked whether these bugs caused failures in more than one server. We found that very few bugs caused failures in two servers and none caused failures in more than two. This would suggest a fault-tolerant server built with diverse off-the-shelf servers would be a prudent choice for improving failure detection. To study other aspects of fault tolerance, namely failure diagnosis and state recovery, we have studied the "data diversity" mechanism and we defined a number of SQL rephrasing rules. These rules transform a client-sent statement into an additional logically equivalent statement, leading to more results being returned to an adjudicator. These rules therefore help to increase the probability of a correct response being returned to a client and maintain a correct state in the database
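The idea of rephrasing a statement and passing both results to an adjudicator can be sketched as follows. The `BETWEEN` rewrite is a hypothetical rule chosen for illustration and is not claimed to be one of the rules defined in the paper; the function names, the table and the use of a single in-memory SQLite database are likewise assumptions.

```python
import re
import sqlite3

def rephrase_between(sql):
    """Hypothetical rule: rewrite `col BETWEEN a AND b` as `(col >= a AND col <= b)`."""
    pattern = re.compile(r"(\w+)\s+BETWEEN\s+(\S+)\s+AND\s+(\S+)", re.IGNORECASE)
    return pattern.sub(r"(\1 >= \2 AND \1 <= \3)", sql)

def adjudicate(results):
    """Return the common result if all results agree, otherwise None."""
    first = results[0]
    return first if all(r == first for r in results[1:]) else None

# Toy database standing in for an off-the-shelf SQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 5), (2, 12), (3, 20)])

original = "SELECT id FROM orders WHERE qty BETWEEN 10 AND 20 ORDER BY id"
rephrased = rephrase_between(original)

# Run both logically equivalent statements and compare their results.
results = [conn.execute(q).fetchall() for q in (original, rephrased)]
agreed = adjudicate(results)
```

A disagreement between the two results (`agreed` is `None`) would signal a failure on at least one of the statements. With diverse servers, the rephrased statement could also be sent to each server, giving the adjudicator more results to compare.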
Uncertainty explicit assessment of off-the-shelf software: Selection of an optimal diverse pair
Assessment of software COTS components is an essential part of component-based software development. Sub-optimal selection of components may lead to solutions with low quality. The assessment is based on incomplete knowledge about the COTS components themselves and about other aspects that may affect the choice, such as the vendor's credentials. We argue in favor of assessment methods in which uncertainty is explicitly represented ('uncertainty explicit' methods) using probability distributions. We have adapted a model (developed elsewhere by Littlewood, B. et al. (2000)) for assessment of a pair of COTS components to take account of the fault (bug) logs that might be available for the COTS components being assessed. We also provide empirical data from a study we have conducted with off-the-shelf database servers, which illustrate the use of the method
The effect of testing on reliability of fault-tolerant software
Previous models have investigated the impact upon diversity - and hence upon the reliability of fault-tolerant software built from 'diverse' versions - of the variation in 'difficulty' of demands over the demand space. These models are essentially static, taking a single snapshot view of the system. In this paper we consider a generalisation in which the individual versions are allowed to evolve - and their reliability to grow - through debugging. In particular, we examine the trade-off that occurs in testing between, on the one hand, the increasing reliability of individual versions, and on the other hand the possible diminution of diversity
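The static "difficulty" models referred to in this abstract can be illustrated with a small calculation. The sketch assumes that two versions are developed independently, that each fails on a given demand with a probability equal to that demand's difficulty, and that failures are independent given the demand. The difficulty values and the demand profile are invented for the example.

```python
from fractions import Fraction

def pair_failure_probabilities(difficulty, profile):
    """Return (independence prediction, difficulty-model probability) that both versions fail.

    difficulty[i] is the assumed probability that a randomly developed
    version fails on demand i; profile[i] is the probability of demand i.
    """
    mean = sum(p * d for p, d in zip(profile, difficulty))
    both_fail = sum(p * d * d for p, d in zip(profile, difficulty))
    return mean * mean, both_fail

# Invented example: three easy demands and one hard demand, equally likely.
difficulty = [Fraction(1, 100), Fraction(1, 100), Fraction(1, 100), Fraction(37, 100)]
profile = [Fraction(1, 4)] * 4

naive, actual = pair_failure_probabilities(difficulty, profile)
# naive  = 1/100      (square of the average version failure probability)
# actual = 343/10000  (mean of the squared difficulties)
```

The gap between the two values equals the variance of the difficulty over the demand space, so the more the difficulty varies between demands, the less a pair of versions gains over the independence prediction. The generalisation described in the abstract lets these quantities change as the versions are debugged, which this static sketch does not model.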
Choosing effective methods for design diversity - How to progress from intuition to science
Design diversity is a popular defence against design faults in safety critical systems. Design diversity is at times pursued by simply isolating the development teams of the different versions, but it is presumably better to "force" diversity, by appropriate prescriptions to the teams. There are many ways of forcing diversity. Yet, managers who have to choose a cost-effective combination of these have little guidance except their own intuition. We argue the need for more scientifically based recommendations, and outline the problems with producing them. We focus on what we think is the standard basis for most recommendations: the belief that, in order to produce failure diversity among versions, project decisions should aim at causing "diversity" among the faults in the versions. We attempt to clarify what these beliefs mean, in which cases they may be justified and how they can be checked or disproved experimentally
Assessing the Reliability of Diverse Fault-Tolerant Systems
Design diversity between redundant channels is a way of improving the dependability of software-based systems, but it does not alleviate the difficulties of dependability assessment
An Empirical Study of the Effectiveness of 'Forcing Diversity' Based on a Large Population of Diverse Programs
Use of diverse software components is a viable defence against common-mode failures in redundant software-based systems. Various forms of "Diversity-Seeking Decisions" ("DSDs") can be applied to the process of developing, or procuring, redundant components, to improve the chances of the resulting components not failing on the same demands. An open question is how effective these decisions, and their combinations, are for achieving large enough reliability gains. Using a large population of software programs, we studied experimentally the effectiveness of specific "DSDs" (and their combinations) mandating differences between redundant components. Some of these combinations produced much better improvements in system probability of failure per demand (PFD) than "uncontrolled" diversity did. Yet, our findings suggest that the gains from such "DSDs" vary significantly between them and between the application problems studied. The relationship between DSDs and system PFD is complex and does not allow for simple universal rules (e.g. "the more diversity the better") to apply
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present